First approach to the selection of lexical units for continuous speech recognition of Basque

نویسندگان

  • Karmele López de Ipiña
  • M. Inés Torres
  • Lourdes Oñederra
  • Amparo Varona
  • Nerea Ezeiza
  • Mikel Peñagarikano
  • M. Hernández
  • Luis Javier Rodríguez-Fuentes
چکیده

The selection of appropriated Lexical Units is an important issue in the Language Model (LM) generation. Word has been used classically as unit in most of the Continuous Speech Recognition systems. However, during the last years proposals of non-word units have begun to appear. Since Basque is an agglutinative language with a certain structure inside the word, the nonword units could be an adequate option. In this work, a statistical analysis of the morphological structure of Basque has been carried out. This analysis shows a slight increment of the rates of confusion in Continuous Speech Recognition Systems due to the great increment of acoustically similar and short units. Finally several proposals of Lexical Units are analyzed to deal with the problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Morphological Segmentation for Continuous Speech Recognition of Basque

The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Word has been used classically as unit in most of them. However, proposals of non-word units have begun to arise. Since the subject of this study is the Basque language, which is an agglutinative language with a complex structure inside words, non-word units ...

متن کامل

Selection of Lexical Units for Continuous Speech Recognition of Basque

The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Words have been used classically as the recognition unit in most of them. However, proposals of nonword units are beginning to arise. Basque is an agglutinative language with some structure inside words, for which non-word morpheme like units could be an appr...

متن کامل

Selection of sublexical units for continuous speech recognition of basque

This paper describes the work carried out to select the most suitable set of Sublexical Units for Continuous Speech Recognition of Basque. Even if there are several dialects in Basque, only one of them has been used to choose the preliminary set of sounds. Bearing in mind this aim, a wide experimentation has been carried out to select Context Independent Phone-Like Units. Then, in order to obta...

متن کامل

Decision Tree-Based Context Dependent Sublexical Units for Continuous Speech Recognition of Basque

This paper presents a new methodology, based on the classical decision trees, to get a suitable set of context dependent sublexical units for Basque Continuous Speech Recognition (CSR). The original method proposed by Bahl [1] was applied as the benchmark. Then two new features were added: a data massaging to emphasise the data and a fast and efficient Growing and Pruning algorithm for DT const...

متن کامل

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000